Mapping cell populations in flow cytometry data for cross‐sample comparison using the Friedman–Rafsky test statistic as a distance measure
Abstract
Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research to identify cellular biomarkers of normal physiological responses and abnormal disease states. Many computational methods have been developed to identify cell populations in individual FCM samples, but very few address how the identified populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for mapping cell populations across FCM samples. FlowMap-FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of their fluorescence data distributions in the multidimensional feature space. To test and evaluate FlowMap-FR, we simulated the kinds of biological and technical sample variation commonly observed in FCM data. The results show that FlowMap-FR effectively identifies equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can determine whether the expression of a cellular marker differs significantly between two cell populations; by providing an objective statistical measure, it can point to candidate new cellular phenotypes. In addition, FlowMap-FR can flag cases in which cell populations were inappropriately split or merged during gating. Using both simulated and real data, we compared the FR statistic with the symmetric Kullback-Leibler (KL) divergence used in a previous population-matching method.
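A minimal sketch of the two dissimilarities being compared, assuming the standard Friedman–Rafsky construction: build a Euclidean minimum spanning tree (MST) over the pooled events of two populations, count the MST edges that join events from different populations, and take R = cross edges + 1 (the number of "runs"). Under the null hypothesis of equal distributions, E[R] = 2mn/N + 1 for sample sizes m, n and N = m + n; far fewer runs (a strongly negative standardized W) indicates different distributions. This sketch standardizes W with a label-permutation null instead of a closed-form variance, and computes the symmetric KL divergence from Gaussian fits. Both choices, and the function names `mst_edges`, `fr_statistic`, and `symmetric_kl_gaussian`, are assumptions for illustration, not the FlowMap-FR package's API.

```python
import numpy as np


def mst_edges(Z):
    """Euclidean minimum spanning tree over the rows of Z via Prim's algorithm (O(N^2))."""
    n = len(Z)
    # Dense pairwise Euclidean distances between all pooled events.
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()               # cheapest known link from each node to the tree
    parent = np.zeros(n, dtype=int)  # tree node that provides that link
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = D[j] < best
        best = np.where(closer, D[j], best)
        parent = np.where(closer, j, parent)
    return np.array(edges)


def fr_statistic(X, Y, n_perm=200, seed=0):
    """Return (R, W): observed runs and a permutation-standardized score (hypothetical helper)."""
    Z = np.vstack([X, Y])
    labels = np.concatenate([np.zeros(len(X), dtype=int), np.ones(len(Y), dtype=int)])
    edges = mst_edges(Z)

    def runs(lab):
        # Runs = number of MST edges joining the two samples, plus one.
        return int(np.sum(lab[edges[:, 0]] != lab[edges[:, 1]])) + 1

    r_obs = runs(labels)
    rng = np.random.default_rng(seed)
    # Null distribution of R under random relabelling of the same MST.
    null = np.array([runs(rng.permutation(labels)) for _ in range(n_perm)])
    w = (r_obs - null.mean()) / null.std(ddof=1)
    return r_obs, w


def symmetric_kl_gaussian(X, Y):
    """KL(P||Q) + KL(Q||P) for Gaussians fitted to X and Y; log-determinant terms cancel."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Sx, Sy = np.cov(X, rowvar=False), np.cov(Y, rowvar=False)
    Sx_inv, Sy_inv = np.linalg.inv(Sx), np.linalg.inv(Sy)
    d = mx - my
    k = X.shape[1]
    return 0.5 * (np.trace(Sy_inv @ Sx) + np.trace(Sx_inv @ Sy) - 2 * k
                  + d @ (Sx_inv + Sy_inv) @ d)


# Toy 2-D "populations": one equivalent pair and one well-separated pair.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(100, 2))
Y_same = rng.normal(0.0, 1.0, size=(100, 2))
Y_shift = rng.normal(0.0, 1.0, size=(100, 2)) + 20.0
r_same, w_same = fr_statistic(X, Y_same)
r_shift, w_shift = fr_statistic(X, Y_shift)
print(r_same, round(w_same, 2), r_shift, round(w_shift, 2))
print(symmetric_kl_gaussian(X, Y_same), symmetric_kl_gaussian(X, Y_shift))
```

Because the dense distance matrix grows as O(N²), populations with tens of thousands of events would need subsampling or a sparse neighbour graph before this kind of MST construction is practical.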
The FR statistic outperforms the symmetric KL divergence in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also used as a distance measure to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. The matching achieved an F-measure of 0.88, indicating high precision and recall. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analysis workflows.
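The abstract reports an F-measure for the matching results but does not spell out the matching rule. The sketch below assumes a simple rule, pairing each population in a new sample with the least-dissimilar reference population, and scores the pairs with the standard F-measure F = 2PR/(P + R), where P is precision and R is recall. The helpers `match_by_min_distance` and `f_measure`, the dissimilarity values, and the population names are hypothetical toy inputs, not FlowMap-FR's API or data.

```python
import numpy as np


def match_by_min_distance(dist, query_names, ref_names):
    """Pair each query population with its least-dissimilar reference population.

    dist[i, j] is a dissimilarity between query population i and reference
    population j (for example an FR-type score); this rule is an assumption.
    """
    best = np.argmin(dist, axis=1)
    return {(query_names[i], ref_names[j]) for i, j in enumerate(best)}


def f_measure(predicted, reference):
    """Harmonic mean of precision and recall over sets of matched pairs."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy dissimilarities: rows = populations in a new sample, columns = reference populations.
dist = np.array([[0.5, 9.0, 8.0],
                 [7.0, 0.8, 6.0],
                 [8.0, 1.0, 5.0]])
predicted = match_by_min_distance(dist, ["q1", "q2", "q3"], ["A", "B", "C"])
truth = {("q1", "A"), ("q2", "B"), ("q3", "C")}
print(sorted(predicted), f_measure(predicted, truth))
```

In this toy case q3 is assigned to B together with q2, so precision and recall are both 2/3; a many-to-one assignment like this is also the kind of pattern that can hint at a population having been split or merged during gating.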
Similar resources
A Multivariate Two-Sample Test Based on the Concept of Minimum Energy
We introduce a new statistical quantity, the energy, to test whether two samples originate from the same distribution. The energy is a simple logarithmic function of the distances of the observations in the variate space. The distribution of the test statistic is determined by a resampling method. The power of the energy test in one dimension was studied for a variety of different test samples a...
Quadratic form: a robust metric for quantitative comparison of flow cytometric histograms.
Comparison of fluorescence distributions is a fundamental part of the analysis of flow cytometric data. This approach is applied to detect differences between control and test samples and thus analyze a biological response. Comparison of standard test samples over time provides an estimate of instrument stability for quality control. However, application of statistical methods of distribution co...
Some Properties of the Two-Sample Multidimensional Runs Statistic
(Under the direction of Dana Quade.) Friedman and Rafsky have generalized the runs statistic presented by Wald and Wolfowitz for testing the homogeneity of two populations to the comparison of multivariate populations. Their multidimensional runs statistic can be defined as the number of links between observations from different populations, where observations ma...
Genetic analysis of six sterlet (Acipenser ruthenus) populations - recommendations for the plan of restitution in the Dniester River
The aim of the present study was the genetic analysis of the Dniester population of sterlet Acipenser ruthenus and comparison of it to five other sterlet populations, in order to develop a population recovery plan. The genetic analysis of six sterlet populations from Eurasian rivers (Dniester, Dnieper, Danube, Volga, Kama and Ob) was carried out using microsatellite DNA markers. The genetic var...
P-95: Flow Cytometry Analysis of Bovine Semen: A Qualitative Study
Background: Although artificial insemination (AI) practices were introduced a little over 60 years ago, the success rate remains relatively low. This might be due to the exclusive selection of semen based on motility analysis. Recent advances in sperm sexing using flow cytometry, with increased throughput from next-generation cell sorters, have made it possible to use this technology to study qualitative aspects of sperm other than ...